Introduction

With skyrocketing rents and home values across the Bay Area, there has been much discussion about housing affordability (or the lack thereof) for renters and mortgage-paying homeowners alike. In this study, I will examine two aspects of housing in the Bay Area: the existing problem of unaffordability, assessed by the proportion of cost-burdened households in Santa Clara County, and the potential for future growth to accommodate the demand for housing, examined in one of San Francisco’s most popular neighborhoods.

Part 1 | Housing Burden

Santa Clara County is home to much of Silicon Valley and has experienced a dramatic increase in housing costs over the last two decades. However, it’s important to recognize that these increased costs are not felt equally by every population. I’ll use the American Community Survey’s Public Use Microdata Sample (PUMS) to better understand the housing burden. PUMS is a detailed survey that contains individual responses rather than geographically-aggregated results (the latter of which can be harder to draw strong conclusions from).

First, I will extract the geography for Santa Clara County and filter the PUMS sample to only those living within the county. Next, I download the variables of interest, housing costs and household income, along with two other variables: the presence and age group of children in the household and the language(s) predominantly spoken at home. The sample also includes a weighting factor that extrapolates the responses of a limited number of respondents so that they are (theoretically) representative of the county as a whole.

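# Load packages (tidycensus provides get_pums() and pums_variables)
library(tidyverse)
library(tigris)
library(sf)
library(tidycensus)

# Get geography data for Santa Clara County
sc_county_name <- c("Santa Clara")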

# Santa Clara County
sc_county <-
  counties("CA", cb = T, progress_bar = F) %>%
  filter(NAME %in% sc_county_name)

ca_pumas <-
  pumas("CA", cb = T, progress_bar = F)

sc_pumas <-
  ca_pumas %>% 
  st_centroid() %>% 
  .[sc_county, ] %>% 
  st_set_geometry(NULL) %>% 
  left_join(ca_pumas %>% select(GEOID10)) %>% 
  st_as_sf()
saveRDS(sc_pumas, "sc_pumas.rds")

# Get income & cost data for SCC
pums_vars_2018 <- 
  pums_variables %>%
  filter(year == 2018, survey == "acs5")

ca_pums <- get_pums(
  variables = c(
    "PUMA",   # public use microdata area
    "HUPAC",  # HH presence and age of children
    "HHL",    # Household language
    "GRNTP",  # gross monthly rent
    "SMOCP",  # selected monthly owner costs
    "ADJHSG", # adjustment factor for housing dollar amounts
    "HINCP",  # household income for the past 12 months
    "ADJINC"  # adjustment factor for income dollar amounts
  ),
  state = "CA",
  year = 2018,
  survey = "acs5"
)

# Save it to prevent loading later
saveRDS(ca_pums, "ca_pums.rds")
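To make the role of the weight concrete, here is a toy sketch with invented values (not PUMS data): each sampled household stands in for WGTP households in the county, so county-level shares come from summed weights rather than from row counts.

```r
# Toy household records (hypothetical values): one row per sampled household
toy <- data.frame(
  WGTP     = c(120, 80, 50, 150),         # household weight: households represented
  burdened = c(TRUE, FALSE, TRUE, FALSE)  # whether the household is cost-burdened
)

# Unweighted share: each sampled row counts once
unweighted_share <- mean(toy$burdened)

# Weighted share: each row counts as WGTP households
weighted_share <- sum(toy$WGTP[toy$burdened]) / sum(toy$WGTP)

unweighted_share  # 0.5
weighted_share    # (120 + 50) / 400 = 0.425
```

The same ratio-of-summed-weights logic is what the later `sum(burdened_30)/sum(households)` lines compute.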

Next, I calculate the housing burden. A broadly-accepted definition of housing affordability is spending no more than 30% of one’s income on housing, and I apply the same threshold to renters and homeowners alike. Housing costs are aggregated in the SMOCP and GRNTP variables for homeowners and renters, respectively. Housing costs and incomes are also adjusted so that all values are in 2018 dollars. I first calculate this cost burden for all households in Santa Clara County, regardless of other socioeconomic factors. Once the number of burdened households and the total dollar amount of that burden are calculated, they’re summarized by Public Use Microdata Area (PUMA) and scaled by the weight factor to be representative of the entire county.

# Load PUMS data
ca_pums <- readRDS("ca_pums.rds")
sc_pumas <- readRDS("sc_pumas.rds")

# Filter to Santa Clara County
sc_pums <-
  ca_pums %>% 
  filter(PUMA %in% sc_pumas$PUMACE10)

# Calculate housing cost burden as a percentage & absolute dollar amount
burden_threshold <- 0.3

# For everyone
sc_burden_all <-
  sc_pums %>% 
  filter(HINCP > 0) %>% # household income reported as >0
  filter(SPORDER == 1) %>% # one record per household (person 1, the householder)
  transmute(
    PUMA = PUMA,
    weight = WGTP,
    housingcost = ifelse(
      SMOCP > 0,
      SMOCP*12*as.numeric(ADJHSG), # owners
      GRNTP*12*as.numeric(ADJHSG)  # renters
    ),
    income = HINCP*as.numeric(ADJINC), # adjust income
    burden_perc = housingcost/income,  # calculate burden as a percentage
    burden_30 = housingcost - burden_threshold*income, # how many dollars > 30% income are housing costs?
    incomegap_30 = housingcost/burden_threshold - income, # how much more would you have to make?
    incomegap_perc = (housingcost/burden_threshold - income)/income # what % more would you have to make?
  )

# Summarize the results for each PUMA
sc_burden_pumas <-
  sc_burden_all %>% 
  mutate(
    burdened_30 = ifelse( # weighted number of households spending >= 30% of income on housing
      burden_perc >= burden_threshold, 
      weight,
      0
    ),
    excess_30 = ifelse( # housing cost below their threshold
      burden_30 < 0, 
      burden_30,
      0
    ),
    burden_30 = ifelse( # housing cost above their threshold
      burden_30 > 0,
      burden_30,
      0
    ),
    incomegap_30 = ifelse( # how much more money would people need to make?
      incomegap_30 > 0,
      incomegap_30,
      0
    )
  ) %>% 
  group_by(PUMA) %>% 
  summarize(
    burdened_30 = sum(burdened_30), # number of HH spending >= 30% of income on housing
    households = sum(weight), # number of households overall
    burden_30 = sum(burden_30*weight), # total "excess" housing cost
    incomegap_30 = sum(incomegap_30*weight), # total "missing" income
    excess_30 = sum(excess_30*weight)
  ) %>% 
  mutate(
    burdened_30_perc = burdened_30/households # percent of HH that are cost-burdened
  ) %>% 
  left_join(sc_pumas %>% select(PUMA = PUMACE10)) %>% # merge datasets
  st_as_sf()
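To make the burden formulas above concrete, here is the arithmetic for a single hypothetical household (numbers invented for illustration) earning $100,000 a year with $3,000 in monthly housing costs, reusing the variable names from the `transmute()` step:

```r
# Hypothetical household: $100,000 annual income, $3,000/month housing cost
burden_threshold <- 0.3
income      <- 100000
housingcost <- 3000 * 12  # annual housing cost: 36,000

burden_perc    <- housingcost / income                     # 0.36, so cost-burdened
burden_30      <- housingcost - burden_threshold * income  # 36,000 - 30,000 = 6,000
incomegap_30   <- housingcost / burden_threshold - income  # 120,000 - 100,000 = 20,000
incomegap_perc <- incomegap_30 / income                    # 0.2: needs 20% more income

c(burden_perc = burden_perc, burden_30 = burden_30,
  incomegap_30 = incomegap_30, incomegap_perc = incomegap_perc)
```

In the PUMA summary, `burden_30` and `incomegap_30` are floored at zero and multiplied by the household weight before summing.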

Given this information, a few meaningful results can be summarized immediately. The percentage of Santa Clara County households that pay more than 30% of their income on housing can be calculated by:

# Percent of SCC households paying more than 30% of their income on housing
sum(sc_burden_pumas$burdened_30)/sum(sc_burden_pumas$households)
## [1] 0.3512912

Therefore, approximately 35.1% of households in Santa Clara County are considered housing cost-burdened, paying more than 30% of their income on housing costs. The total amount of this burden (roughly, how much funding would be required to eliminate it) is calculated by:

# Amount of federal funding needed to make up the difference
sum(sc_burden_pumas$burden_30)
## [1] 2674114821

This value is staggering: a cumulative annual housing burden of $2.7 billion for a county of 629,011 households. But how much of this burden falls on the most vulnerable households? To analyze this, I repeat the previous analysis, this time filtering for households with young children, which might include younger parents struggling with many living expenses. Using the HUPAC variable, I keep values of 1 or 3, which include any household with at least one child younger than 6 years old.
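The child filter described above can be sketched on toy records (hypothetical values). Per the ACS PUMS data dictionary, HUPAC codes 1 and 3 are the two categories that include a child under 6; the sketch wraps HUPAC in `as.character()` because whether the column arrives as text or as a number is an assumption here. In the real pipeline, the same condition would be a `filter()` step on `sc_pums` before `transmute()`.

```r
# Toy subset of PUMS-like records (hypothetical values)
toy_pums <- data.frame(
  SERIALNO = c("A", "B", "C", "D"),
  HUPAC    = c("1", "2", "3", "4"),  # 1: under 6 only, 2: 6-17 only, 3: both, 4: none
  stringsAsFactors = FALSE
)

# Keep households with at least one child under 6 (codes 1 or 3);
# as.character() makes the comparison work whether HUPAC is text or numeric
young_kids <- toy_pums[as.character(toy_pums$HUPAC) %in% c("1", "3"), ]

young_kids$SERIALNO  # "A" "C"
```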

After repeating the prior analysis, I again calculate the proportion of households with at least one child <6 years old who are housing cost-burdened:

# Percent of SCC households with at least one child <6 years old paying more than 30% of their income on housing
sum(sc_burden_pumas$burdened_30)/sum(sc_burden_pumas$households)
## [1] 0.3333026

The proportion is roughly the same as (and in fact slightly lower than) that of the entire county: 33.3% of households. This suggests that the presence of young children is not associated with an increased housing burden in Santa Clara County.

Next, I investigate another socioeconomic factor: households that don’t primarily speak English at home. Repeating the same analysis, but now filtering for households where the household language variable, HHL, is not “English only,” I find that the proportion of cost-burdened households is 36.6%, again similar to the county overall. But is this the same for all languages? I repeat the analysis for households that speak Spanish at home:

# Percent of SCC households speaking Spanish at home and paying more than 30% of their income on housing
sum(sc_burden_pumas$burdened_30)/sum(sc_burden_pumas$households)
## [1] 0.4748164

Therefore, 47.5% of Spanish-speaking households in Santa Clara County are housing cost-burdened! This suggests that while not speaking English at home is not, by itself, associated with a substantially higher housing cost burden, speaking Spanish at home is. The county might therefore consider initiatives specifically for Spanish-speaking residents; for example, distributing materials advertising low-cost housing in Spanish or allocating a proportion of its affordable housing specifically for Latinx residents.
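The two language filters can be sketched the same way (toy records, hypothetical values). The sketch assumes the ACS PUMS coding of HHL, in which 1 is English only and 2 is Spanish; if the data also carries a not-applicable code, `hhl %in% c("2", "3", "4", "5")` is a stricter way to select non-English households.

```r
# Toy subset of PUMS-like records (hypothetical values)
toy_pums <- data.frame(
  SERIALNO = c("A", "B", "C", "D", "E"),
  HHL      = c("1", "2", "3", "4", "5"),  # 1: English only, 2: Spanish, 3-5: other languages
  stringsAsFactors = FALSE
)

hhl <- as.character(toy_pums$HHL)

non_english <- toy_pums[hhl != "1", ]  # any language other than English only
spanish     <- toy_pums[hhl == "2", ]  # Spanish spoken at home

non_english$SERIALNO  # "B" "C" "D" "E"
spanish$SERIALNO      # "B"
```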

The last group I’ll consider is households that make less than the Area Median Income (AMI). In this way, I try to isolate those whose housing cost burden may reflect a broader challenge with living expenses, rather than high-income households with expensive homes. I again repeat the same analysis, this time filtering for households that make less than the $116,178 median household income (averaged over 2014-2018, in 2018 dollars) reported for Santa Clara County on the U.S. Census Bureau’s QuickFacts website.

# Calculate housing cost burden as a percentage & absolute dollar amount
burden_threshold <- 0.3

# For those making less than AMI ($116,178 for SCC, as per the Census Bureau: https://www.census.gov/quickfacts/fact/table/santaclaracountycalifornia/INC110218)
scc_AMI <- 116178

sc_burden_AMI <-
  sc_pums %>% 
  filter(HINCP > 0) %>% # household income reported as >0
  filter(HINCP*as.numeric(ADJINC) < scc_AMI) %>% # those making less than AMI
  filter(SPORDER == 1) %>% # one record per household (person 1, the householder)
  transmute(
    PUMA = PUMA,
    weight = WGTP,
    housingcost = ifelse(
      SMOCP > 0,
      SMOCP*12*as.numeric(ADJHSG), # owners
      GRNTP*12*as.numeric(ADJHSG)  # renters
    ),
    income = HINCP*as.numeric(ADJINC), # adjust income
    burden_perc = housingcost/income,  # calculate burden as a percentage
    burden_30 = housingcost - burden_threshold*income, # how many dollars > 30% income are housing costs?
    incomegap_30 = housingcost/burden_threshold - income, # how much more would you have to make?
    incomegap_perc = (housingcost/burden_threshold - income)/income # what % more would you have to make?
  )

# Summarize the results for each PUMA
sc_burden_pumas <-
  sc_burden_AMI %>% 
  mutate(
    burdened_30 = ifelse( # weighted number of households spending >= 30% of income on housing
      burden_perc >= burden_threshold, 
      weight,
      0
    ),
    excess_30 = ifelse( # housing cost below their threshold
      burden_30 < 0, 
      burden_30,
      0
    ),
    burden_30 = ifelse( # housing cost above their threshold
      burden_30 > 0,
      burden_30,
      0
    ),
    incomegap_30 = ifelse( # how much more money would people need to make?
      incomegap_30 > 0,
      incomegap_30,
      0
    )
  ) %>% 
  group_by(PUMA) %>% 
  summarize(
    burdened_30 = sum(burdened_30), # number of HH spending >= 30% of income on housing
    households = sum(weight), # number of households overall
    burden_30 = sum(burden_30*weight), # total "excess" housing cost
    incomegap_30 = sum(incomegap_30*weight), # total "missing" income
    excess_30 = sum(excess_30*weight)
  ) %>% 
  mutate(
    burdened_30_perc = burdened_30/households # percent of HH that are cost-burdened
  ) %>% 
  left_join(sc_pumas %>% select(PUMA = PUMACE10)) %>% # merge datasets
  st_as_sf()
# Percent of SCC households making < AMI paying more than 30% of their income on housing
sum(sc_burden_pumas$burdened_30)/sum(sc_burden_pumas$households)
## [1] 0.6089261

As expected (but disheartening to see), households making less than the AMI are housing cost-burdened at a much higher rate than the overall population: 60.9%. Though housing affordability is an issue for all residents of Santa Clara County, it clearly disproportionately affects lower-income households.

# Amount of federal funding needed to make up the difference
sum(sc_burden_pumas$burden_30)
## [1] 2297901668

Furthermore, the total housing cost burden for this group is still significant: $2.3 billion. This raises the question: if Santa Clara County had a fixed budget to apply to the housing affordability crisis, what would be the best way to spend it? For the sake of this study, I will assume a budget of $1B, less than half of the total housing burden of these lower-income households, but sufficient to make a meaningful impact on the problem.

I first explored households making less than the AMI that are “severely” cost-burdened, that is, spending more than 50% of their income on housing. I found that 31.4% of those making less than the AMI are severely cost-burdened, and that their total cost burden, at $1.16 billion, still exceeds the budget. After iterating a few times, I found that among households making less than 50% of the AMI, 52.83% are severely cost-burdened, and that their total housing burden is $1.005 billion, in line with the proposed budget. Therefore, I propose that if Santa Clara County were to address housing affordability via direct financial aid, it should start with severely cost-burdened households making less than half of the AMI.
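The iteration described above (varying the income cutoff and the burden threshold) can be sketched as a small helper. The function name `severe_summary()` and its arguments are hypothetical, and since the text does not state whether the dollar totals measure costs above 30% or above 50% of income, that baseline is left as a parameter (`excess_threshold`).

```r
# Hypothetical helper: among households earning less than ami_frac * ami,
# return the weighted share spending >= burden_threshold of income on housing
# and the weighted dollars those households spend above excess_threshold * income
severe_summary <- function(income, housingcost, weight,
                           ami = 116178, ami_frac = 0.5,
                           burden_threshold = 0.5,
                           excess_threshold = burden_threshold) {
  keep <- income > 0 & income < ami_frac * ami
  income <- income[keep]
  housingcost <- housingcost[keep]
  weight <- weight[keep]

  burdened <- housingcost / income >= burden_threshold
  # Only burdened households contribute dollars; negative gaps are floored at 0
  excess <- pmax(housingcost - excess_threshold * income, 0) * burdened

  c(share = sum(weight[burdened]) / sum(weight),
    dollars = sum(excess * weight))
}

# Toy example (made-up values) with a $100,000 AMI, so the cutoff is $50,000
res <- severe_summary(
  income      = c(40000, 30000, 45000, 60000),
  housingcost = c(24000, 12000, 30000, 40000),
  weight      = c(10, 20, 30, 40),
  ami         = 100000
)
res  # share = 2/3 (weights 10 + 30 of 60); dollars = 4000*10 + 7500*30 = 265000
```

Calling it repeatedly with different `ami_frac` values is one way to search for the group whose total burden matches a given budget.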

The locations where this proportion is higher and lower throughout the county are visualized below:

The distribution of the low-income (< 50% AMI) housing burden can also be visualized geospatially by total dollars:

The maps above indicate that while the Cupertino area and parts of San Jose have a higher proportion of cost-burdened households, the Mountain View area has the highest total cost burden. This suggests that housing in this area is, on average, substantially more expensive than in surrounding areas. I conclude this portion with a few observations:

Part 2 | Land Use

Clearly, housing affordability remains a significant challenge in Santa Clara County. Much of this is likely driven by Silicon Valley and the influx of well-paid tech workers, who drive up housing costs for all residents. Another popular area for young tech workers to live, however, is the Mission District in San Francisco. This neighborhood has a rich history of Latinx culture, as the Latinx community grew there beginning in the 1960s. Over the last two decades, though, it has been steadily gentrified: the Latinx population has dramatically dropped while the White population has steadily risen. Gentrification is another effect of the rapid growth of San Francisco and the tech industry. For the moment, I will set its complex challenges and implications aside and instead ask: does the increasingly popular Mission District have space for additional housing growth?

To address this question, I first extract the active parcel shapes for all of San Francisco from the city’s open data platform. Next, I extract property tax roll data from the Assessor-Recorder’s office. Although these data sets cover the same areas, they come from different sources, so joining them into a single data frame by matching APNs (Assessor’s Parcel Numbers) is imperfect. Not all of the parcels are matched, but for the purposes of this study, I will consider the combined dataset essentially complete.

# Set up packages
library(tidyverse)
library(readxl)
library(tigris)
library(sf)
library(leaflet)
library(mapboxapi)

# Mapbox API token: substitute your own public token rather than hard-coding a real one
mb_access_token("YOUR_MAPBOX_PUBLIC_TOKEN")
readRenviron("~/.Renviron")

# Read active parcel shapes
sf_parcels_shape <- 
  st_read("https://data.sfgov.org/api/geospatial/acdm-wktn?method=export&format=GeoJSON") %>% 
  filter(active == "true") %>% 
  select(
    apn = blklot,
    zoning = zoning_code,
    zoning_desc = zoning_district
  )

# Read the Assessor-Recorder secured property tax roll data
temp <- tempfile()
download.file("https://sfassessor.org/sites/default/files/uploaded/2020.7.10_SF_ASR_Secured_Roll_Data_2019-2020.xlsx",destfile = temp, mode = "wb")

sf_secured <- read_excel(temp, sheet = "Roll Data 2019-2020")
datakey <- read_excel(temp, sheet = "Data Key")
usecode <- read_excel(temp, sheet = "Class Code Only")

unlink(temp)

# Join the property tax data to the parcel shape dataframe by APN (Assessor's Parcel Number)
sf_parcels <-
  sf_parcels_shape %>% 
  left_join(
    sf_secured %>% 
      mutate(
        apn = RP1PRCLID %>% 
          str_replace(" ","")
      ),
    by = "apn" # join explicitly on the normalized APN
  )
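One way to gauge how much the APN join loses is to count unmatched parcels. The sketch below uses hypothetical identifiers and assumes the roll’s RP1PRCLID can contain more than one space; it contrasts removing only the first space (what `str_replace()` does in the join above, like base R’s `sub()`) with removing every space via `gsub()` (stringr’s `str_remove_all()` is the tidyverse equivalent).

```r
# Hypothetical identifiers: parcel shapes use blklot with no space,
# while the roll's RP1PRCLID may contain one or more spaces
parcels <- data.frame(apn = c("3512001", "3512002", "3600010"),
                      stringsAsFactors = FALSE)
roll <- data.frame(RP1PRCLID = c("3512 001", "3512  002", "3999 001"),
                   stringsAsFactors = FALSE)

# Removing only the first space leaves "3512 002" unmatched
first_only <- sub(" ", "", roll$RP1PRCLID, fixed = TRUE)
mean(parcels$apn %in% first_only)  # 1/3 of parcels matched

# Removing every space before matching
roll$apn <- gsub(" ", "", roll$RP1PRCLID, fixed = TRUE)
matched    <- parcels$apn %in% roll$apn
match_rate <- mean(matched)          # 2/3 of parcels matched
unmatched  <- parcels$apn[!matched]  # "3600010" has no roll record at all
```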

Now that the parcel information is combined in a single data frame, I filter to the four census tracts that span much of the Mission’s two main drags: Mission St and Valencia St.

# Filter to area of interest -- focus on the Mission district to see how much more capacity it has
mission_sample <-
  tracts("CA", "San Francisco", cb = T, progress_bar = F) %>% 
  filter(
    TRACTCE %in% c(
      "020700",
      "020800",
      "021000",
      "020900"
    )
  ) %>% 
  st_transform(4326)

mission_parcels <- 
  sf_parcels %>% 
  st_centroid() %>% 
  .[mission_sample, ] %>% 
  st_set_geometry(NULL) %>% 
  left_join(sf_parcels %>% select(apn)) %>% 
  st_as_sf() %>% 
  filter(!is.na(RP1PRCLID))

# Save for future reference
saveRDS(mission_sample, "mission_sample.rds")
saveRDS(mission_parcels, "mission_parcels.rds")

The parcels I obtain can be visually verified below:

Most of the parcels in the area appear to be correctly extracted. The dark parcels are ones that are repeated multiple times (likely due to condominiums on that parcel); these duplicates are collapsed in the subsequent data cleaning step.

Next, I examine the zoning codes for the parcels:

# Check which zones are in the set of parcels
mission_parcels %>%
  st_set_geometry(NULL) %>% 
  group_by(zoning, zoning_desc) %>% 
  summarize(Freq = n())
## # A tibble: 16 x 3
## # Groups:   zoning [14]
##    zoning    zoning_desc                                                    Freq
##    <chr>     <chr>                                                         <int>
##  1 NC-1      NEIGHBORHOOD COMMERCIAL, CLUSTER                                 47
##  2 NC-2      NEIGHBORHOOD COMMERCIAL, SMALL SCALE                              1
##  3 NC-3      NEIGHBORHOOD COMMERCIAL, MODERATE SCALE                           7
##  4 NCT       24TH-MISSION NEIGHBORHOOD COMMERCIAL TRANSIT                      8
##  5 NCT       MISSION STREET NEIGHBORHOOD COMMERCIAL TRANSIT                  714
##  6 NCT       VALENCIA STREET NEIGHBORHOOD COMMERCIAL TRANSIT                 417
##  7 NCT|RTO-M VALENCIA STREET NEIGHBORHOOD COMMERCIAL TRANSIT|RESIDENTIAL …     1
##  8 P         PUBLIC                                                           16
##  9 RH-2      RESIDENTIAL- HOUSE, TWO FAMILY                                  129
## 10 RH-2|RH-3 RESIDENTIAL- HOUSE, TWO FAMILY|RESIDENTIAL- HOUSE, THREE FAM…    45
## 11 RH-3      RESIDENTIAL- HOUSE, THREE FAMILY                               1077
## 12 RH-3|RM-2 RESIDENTIAL- HOUSE, THREE FAMILY|RESIDENTIAL- MIXED, MODERAT…     1
## 13 RM-1      RESIDENTIAL- MIXED, LOW DENSITY                                 122
## 14 RM-2      RESIDENTIAL- MIXED, MODERATE DENSITY                            160
## 15 RTO-M     RESIDENTIAL TRANSIT ORIENTED- MISSION                          1159
## 16 UMU       URBAN MIXED USE                                                  14

As seen above, there are three instances of the “NCT” code, but they refer to different zones: the 24th Street-Mission Neighborhood Commercial Transit district, the Mission Street Neighborhood Commercial Transit district, and the Valencia Street Neighborhood Commercial Transit district. The Mission’s excellent access to public transit (BART) is part of why it’s so desirable, but it also leads to more complex zoning than in other parts of the city.

There are also three kinds of dual-zoned parcels. I will treat “RH-3|RM-2” as RM-2, “RH-2|RH-3” as RH-3, and “NCT|RTO-M” as RTO-M. The first and last decisions match the zoning of the surrounding parcels, while the second is made because RH-3 is the more flexible designation. Lastly, I will ignore the 16 public (P) parcels for the sake of this housing study.

With these decisions made, I can clean the data accordingly and remove the duplicates.

# Data cleaning
mission_parcels_clean <-
  mission_parcels %>% 
  mutate(
    zoning = case_when(
      zoning == "RH-3|RM-2" ~ "RM-2",
      zoning == "RH-2|RH-3" ~ "RH-3",
      zoning == "NCT|RTO-M" ~ "RTO-M",
      zoning_desc == "24TH-MISSION NEIGHBORHOOD COMMERCIAL TRANSIT" ~ "24TH-MISSION",
      zoning_desc == "MISSION STREET NEIGHBORHOOD COMMERCIAL TRANSIT" ~ "MISSION",
      zoning_desc == "VALENCIA STREET NEIGHBORHOOD COMMERCIAL TRANSIT" ~ "VALENCIA",
      TRUE ~ zoning
    )
  ) %>% 
  filter(zoning != "P") %>% 
  as.data.frame() %>% 
  mutate(geometry = geometry %>% st_as_text()) %>% 
  group_by(geometry) %>% 
  summarize(
    apn = first(apn),
    zoning = first(zoning),
    units = sum(UNITS, na.rm = T),
    stories = if (all(is.na(STOREYNO))) NA_real_ else max(STOREYNO, na.rm = T), # avoid -Inf when no story count is recorded
    floorarea = sum(SQFT, na.rm = T)
  ) %>% 
  ungroup() %>%
  select(-geometry) %>% 
  left_join(mission_parcels %>% select(apn)) %>% 
  st_as_sf()

# New zoning types
mission_parcels_clean %>%
  st_set_geometry(NULL) %>% 
  group_by(zoning) %>% 
  summarize(Freq = n())
## # A tibble: 12 x 2
##    zoning        Freq
##    <chr>        <int>
##  1 24TH-MISSION     8
##  2 MISSION        332
##  3 NC-1            38
##  4 NC-2             1
##  5 NC-3             2
##  6 RH-2           116
##  7 RH-3           806
##  8 RM-1            83
##  9 RM-2           120
## 10 RTO-M          923
## 11 UMU             12
## 12 VALENCIA       240

The cleaned data is now grouped into the zoning designations summarized above. RTO-M, or “Residential Transit Oriented-Mission,” is the most common, followed by RH-3, “Residential-House, Three Family.” This suggests that planners have indeed considered the Mission’s easy access to transit and its popularity for housing in zoning the neighborhood.

Next, I search through the SF zoning code to determine a few important parameters for each zoning designation. The floor-area ratio (FAR) indicates how much interior floor area can be constructed on a parcel relative to its land area. The maximum number of units is also an important parameter, often expressed as one unit per a given number of square feet of lot area. The designations for each zoning code are as follows:

I note that these FARs are stated in the code as applying to non-residential uses, but for the purposes of this study I will assume they hold for residential uses as well.

For the other “specialty” zoning designations, there are a few nuances. Notably, the maximum number of units isn’t given for the 24th-Mission, Mission, and Valencia codes; instead, the code states “No residential density limit by lot area. Density restricted by physical envelope controls of height, bulk, setbacks, open space, exposure and other applicable controls of this and other Codes, as well as by applicable design guidelines, applicable elements and area plans of the General Plan, and design review by the Planning Department.”

Therefore, I will assume that the density limit for those districts is one unit per 600 square feet of lot area, as this is the densest of the other designations, and these districts apply to areas of denser development.

Lastly, the Urban Mixed Use (UMU) designation has almost no easily found parameters in the code: neither a FAR nor a density limit. Since Urban Mixed Use implies relatively dense development, I will assume the densest values found among the other zoning types considered: a FAR of 3.6 to 1, and a maximum of one unit per 600 sq. ft. of lot area.
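The FAR and density rules above translate into simple per-parcel arithmetic. The sketch below uses a hypothetical helper, `zoning_capacity()`, with an invented 2,500 sq ft lot and the values the later calculation applies to RTO-M (FAR 1.8, one unit per 600 sq ft). It covers only the lot-area-based rules, not the fixed unit counts for RH-2 and RH-3 or the three-unit minimum for RM-1 and RM-2.

```r
# Hypothetical helper mirroring the lot-area-based rules used later
zoning_capacity <- function(lotarea, far, sqft_per_unit,
                            existing_floorarea = 0, existing_units = 0) {
  max_floorarea <- lotarea * far               # FAR times lot area
  max_units     <- floor(lotarea / sqft_per_unit)  # whole units only
  c(max_floorarea    = max_floorarea,
    unused_floorarea = max(max_floorarea - existing_floorarea, 0),
    max_units        = max_units,
    unused_units     = max(max_units - existing_units, 0))
}

# Example: a 2,500 sq ft lot currently holding a 3,000 sq ft, 2-unit building
cap <- zoning_capacity(lotarea = 2500, far = 1.8, sqft_per_unit = 600,
                       existing_floorarea = 3000, existing_units = 2)
cap  # max_floorarea 4500, unused_floorarea 1500, max_units 4, unused_units 2
```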

The last constraint that might restrict housing construction is the maximum height limit. For most of San Francisco, this is set at 40’. However, there are some exceptions: the RH zoning designations are restricted to 35’, and there are special height districts. Below, I load the height districts to see which apply to the Mission District.

## Reading layer `h9wh-cg3m' from data source `https://data.sfgov.org/resource/h9wh-cg3m.geojson' using driver `GeoJSON'
## Simple feature collection with 1000 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.5149 ymin: 37.70779 xmax: -122.357 ymax: 37.83062
## geographic CRS: WGS 84

As seen above, a number of special height districts apply to the Mission, especially along Mission and Valencia streets, as we might expect. I account for these height limits, as well as the FARs and maximum numbers of units summarized previously, in the calculations below.

# Calculate the max floor area, units, and stories along with unused versions of each
projection <- "+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83 +units=ft +no_defs"

mission_parcels_zoning <-
  mission_parcels_clean %>% 
  st_centroid() %>% 
  st_join(mission_heights %>% select(gen_hght)) %>% 
  st_set_geometry(NULL) %>% 
  left_join(mission_parcels_clean %>% select(apn)) %>% 
  st_as_sf() %>% 
  st_transform(projection) %>% 
  mutate(
    lotarea = st_area(.) %>% as.numeric(),
    max_floorarea = case_when(
      zoning %in% c("MISSION", "NC-3", "UMU") ~ lotarea*3.6,
      zoning %in% c("24TH-MISSION", "NC-2", "VALENCIA") ~ lotarea*2.5,
      zoning %in% c("NC-1","RH-2","RH-3","RM-1","RM-2","RTO-M") ~ lotarea*1.8
    ),
    unused_floorarea = ifelse(
      (max_floorarea - floorarea) > 0,
      (max_floorarea - floorarea),
      0
    ),
    max_units = case_when(
      zoning %in% c("NC-1", "NC-2") ~ floor(lotarea/800),
      zoning %in% c("NC-3", "RTO-M", "MISSION", "24TH-MISSION", "VALENCIA", "UMU") ~ floor(lotarea/600), # assumption b/c highest density
      zoning == "RH-2" ~ 2,
      zoning == "RH-3" ~ 3,
      zoning == "RM-1" ~ pmax(3, floor(lotarea/800)),
      zoning == "RM-2" ~ pmax(3, floor(lotarea/600))
    ),
    unused_units = ifelse(
      (max_units - units) > 0,
      (max_units - units),
      0
    ),
    max_height = ifelse(
      is.na(gen_hght),
      40,
      gen_hght %>% as.numeric()
    ),
    max_stories = floor(max_height/11), # assumes roughly 11 ft per story
    unused_stories = ifelse(
      (max_stories - stories) > 0,
      (max_stories - stories),
      0
    )
  ) %>% 
  st_transform(4326) %>%
  # gen_hght is a height in feet: this removes the two outlier parcels with implausible
  # height limits, but also drops parcels in districts above 80 ft or with no district match
  filter(as.numeric(gen_hght) < 81)
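The calculation above uses `gen_hght` where a height district applies and 40 ft otherwise, then converts feet to stories at 11 ft each. The text also mentions a 35-ft limit for RH districts, which that code does not apply. A minimal sketch of a hypothetical helper, `height_limit()`, that adds it, assuming RH zones can be identified by their “RH-” prefix:

```r
# Hypothetical helper: height limit in feet (40 ft default, 35 ft cap for RH zones)
height_limit <- function(zoning, district_height) {
  h <- ifelse(is.na(district_height), 40, as.numeric(district_height))
  ifelse(grepl("^RH-", zoning), pmin(h, 35), h)
}

# Whole stories that fit under a height limit, assuming ~11 ft per story
max_stories <- function(max_height, ft_per_story = 11) {
  floor(max_height / ft_per_story)
}

zoning          <- c("RH-3", "RTO-M", "MISSION")
district_height <- c(NA, "55", "65")  # gen_hght-style values, possibly stored as text

h <- height_limit(zoning, district_height)
h               # 35 55 65
max_stories(h)  # 3 5 5
```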

After completing these data manipulations and calculating the unused amount of floor area, units, and stories, I can summarize some results for the entire neighborhood below:

sum(mission_parcels_zoning$unused_floorarea, na.rm = T)
## [1] 5828123
sum(mission_parcels_zoning$unused_units, na.rm = T)
## [1] 3983

Therefore, the Mission District has a whopping 5.8M square feet of unused potential floor area, and as many as 3,983 unused units, assuming that all the available space is used for residential housing. However, this analysis does not account for the requirement that ground-floor space be used for commercial businesses along all of the considered portion of Mission St and parts of 22nd St. The unused floor area and unit counts are therefore somewhat of an overestimate. This unused floor area can be visualized in the plot below:

Similarly, the allowed additional units are mapped:

Finally, I map the potential unused stories. When I first plotted this map, I noticed two parcels accidentally designated at much more than 80 stories; these parcels have been removed as part of the data cleaning procedure above.

From the plots above, the parcels with more unused floor area or more additional units allowed do not appear to be clustered in any one geographic area, but rather spread throughout the neighborhood. That said, certain parcels have both a large amount of unused floor area and a high number of additional units allowed; these are often large parcels that are likely not heavily built-up.

There are also a few outliers with a large amount of unused floor area and/or additional units. A quick investigation shows that these are tied to very large parcels (probably currently used for warehouses or something similar), so they do not appear to be calculation mistakes but rather a function of extremely large parcel sizes.
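A quick check like the one described can be sketched as ranking parcels by unused floor area and comparing their lot areas to the median; the values below are invented. If a parcel’s unused floor area per square foot of lot is typical, its large total comes from lot size rather than from a calculation error.

```r
# Hypothetical parcel table (made-up values)
parcels <- data.frame(
  apn              = c("p1", "p2", "p3", "p4", "p5"),
  lotarea          = c(2500, 3000, 2400, 40000, 2600),
  unused_floorarea = c(1000, 1500, 800, 20000, 1200),
  stringsAsFactors = FALSE
)

# Two parcels with the most unused floor area
top <- parcels[order(-parcels$unused_floorarea), ][1:2, ]

# How large is each lot relative to the median, and is capacity per sq ft unusual?
top$lot_vs_median   <- top$lotarea / median(parcels$lotarea)
top$unused_per_sqft <- top$unused_floorarea / top$lotarea
top  # p4: lot about 15x the median, but 0.5 sq ft unused per sq ft of lot, like p2
```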

Overall, the Mission appears to still have substantial potential for growth under its current zoning, but not enough to substantially change the housing affordability problem in Silicon Valley or the Bay Area in general. Again, this analysis somewhat overestimates that potential, as it doesn’t account for designated ground-floor commercial space in the main business corridors, but I believe it’s close enough to support these conclusions. It is also important to consider whether this development should happen in such a heavily gentrified neighborhood, and if so, how it might improve the lives of those already living in the area instead of forcing them out of their homes. Housing, affordability, and land use are complex and highly interconnected challenges that the Bay Area must continue to grapple with as its tech economy shows no signs of slowing.
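The ground-floor commercial requirement could be folded in with a rough adjustment. This sketch is an assumption, not part of the analysis above: it treats the ground floor of a flagged corridor parcel as a full lot-sized floor that cannot be residential, and subtracts it from the unused floor area. That likely understates residential capacity on parcels whose ground floor is already built out, so it gives a rough lower bound to pair with the upper bound above. The function name, flags, and values are hypothetical; in practice the flags would come from the parcels fronting Mission St and the relevant parts of 22nd St.

```r
# Hypothetical adjustment: on parcels where the ground floor must be commercial,
# remove one lot-sized floor from the residential unused floor area
adjust_for_ground_floor <- function(unused_floorarea, lotarea, ground_floor_commercial) {
  pmax(unused_floorarea - ifelse(ground_floor_commercial, lotarea, 0), 0)
}

unused  <- c(4000, 1500, 2000)
lot     <- c(2500, 2500, 1000)
on_corr <- c(TRUE, TRUE, FALSE)  # illustrative commercial-corridor flags

adjusted <- adjust_for_ground_floor(unused, lot, on_corr)
adjusted                     # 1500 0 2000
sum(unused) - sum(adjusted)  # 4000 sq ft no longer counted as residential
```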